Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker support for pipeline runs #201

Merged
merged 21 commits into from
Feb 14, 2020

Conversation

michelvocks
Copy link
Member

No description provided.

@michelvocks michelvocks added the Needs Work The PR still requires some work from the submitter. label Aug 1, 2019
server/server.go Outdated Show resolved Hide resolved
server/server.go Outdated Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Sep 25, 2019

Codecov Report

Merging #201 into master will decrease coverage by 0.9%.
The diff coverage is 48.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #201      +/-   ##
==========================================
- Coverage      62%   61.09%   -0.91%     
==========================================
  Files          49       49              
  Lines        4066     4259     +193     
==========================================
+ Hits         2521     2602      +81     
- Misses       1132     1214      +82     
- Partials      413      443      +30
Impacted Files Coverage Δ
store/store.go 74.28% <ø> (ø) ⬆️
helper/pipelinehelper/pipelinehelper.go 100% <ø> (ø) ⬆️
store/pipeline.go 62.02% <0%> (-5.1%) ⬇️
workers/scheduler/scheduler.go 60.45% <10%> (-4.99%) ⬇️
workers/pipeline/update_pipeline.go 52.54% <100%> (ø) ⬆️
workers/pipeline/build_cpp.go 95.12% <100%> (ø) ⬆️
workers/pipeline/build_golang.go 94.64% <100%> (ø) ⬆️
workers/pipeline/pipeline.go 94.49% <100%> (+2.36%) ⬆️
workers/pipeline/build_nodejs.go 100% <100%> (ø) ⬆️
workers/pipeline/build_java.go 89.13% <100%> (ø) ⬆️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13f272f...424cb98. Read the comment docs.

@michelvocks michelvocks marked this pull request as ready for review October 7, 2019 06:29
@michelvocks michelvocks requested a review from Skarlso October 7, 2019 06:29
@michelvocks michelvocks added needs review Ready To Merge PR is ready to be merged into master and removed Needs Work The PR still requires some work from the submitter. labels Oct 7, 2019
@michelvocks
Copy link
Member Author

@Skarlso This is ready for review. I still have to work on the documentation part which might be helpful for the review 😄

@Skarlso
Copy link
Member

Skarlso commented Oct 7, 2019

Great job! I'll review it. 😊

server/server.go Show resolved Hide resolved
Comment on lines +471 to +476
item := iter.Next()

if item == nil {
break
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put this check into the for?

for item := iter.Next(); item != nil {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I will change that! 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That actually doesn't work. item := iter.Next() is only executed once before loop iteration.

Copy link
Member

@Skarlso Skarlso Dec 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true I means to say item := node; item != nil; item = iter.Next.

@Skarlso
Copy link
Member

Skarlso commented Oct 8, 2019

Sorry, only picking through this slowly. But I'll get there. :)

if err := s.memDBService.InsertDockerWorker(worker); err != nil {
gaia.Cfg.Logger.Error("failed to cache docker worker in memdb", "error", err)
if err := worker.KillDockerWorker(); err != nil {
gaia.Cfg.Logger.Error("failed to kill docker worker", "error", err)
Copy link
Member

@Skarlso Skarlso Oct 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like simply logging this fact is weird. At the least we should wrap it and pass it up, don't you think? Or provide some context, I don't know. If we just log it, I feel like we simply don't care and might as well not log it. Especially since Gaia is very verbose now and this might be just missed. I might entirely miss this, unless I grep for it. And even then I might just not care since this already errorred out because of a different reason I'm trying to solve / fix which will probably fix the docker worker as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the least we should wrap it and pass it up, don't you think?

Can you provide an example? If you mean with "pass it up" returning the error then this is not possible. We are in the schedule function which is periodically triggered and doesn't allow a return parameter.

The reason why I added this output is only for debugging purposes. What would you expect to happen here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the scheduler never returns an error. I guess it's okay, it's already in a failing statement.

@michelvocks
Copy link
Member Author

Related Docs PR: gaia-pipeline/docs#14

gaia.Cfg.Logger.Info("try to pull docker image and then try it again...")

// Pull worker image
_, err = cli.ImagePull(ctx, workerImage, types.ImagePullOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth mentioning that right now this only supports public docker registry and no private registries. I guess we can add that functionality later on?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Yes, exactly. I wouldn't prioritize this yet since we can easily add this at a later point in time.


// KillDockerWorker disconnects the existing docker worker and
// kills the docker container.
func (w *Worker) KillDockerWorker() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not start the container with --rm? I'm sure the api can do that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know --rm does not work with detached containers (e.g. -d). Additionally, I have to stop the container anyway so that the container would be removed by the --rm.

@Skarlso
Copy link
Member

Skarlso commented Oct 11, 2019

@michelvocks So I wanted to test this now, and I found two bugs...

First, when I created the pipeline and enabled the docker thing on the pipeline creation ui, it wasn't enabled for the pipeline. The indicator on the pipeline view said disabled and in fact it did not run in a container.

Screenshot 2019-10-11 at 9 27 22

Once I enabled this though... It failed to run unfortunately by containously spewing this:

2019-10-11T09:29:30.006+0200 [INFO ] Gaia: try to pull docker image and then try it again...
2019-10-11T09:29:31.864+0200 [ERROR] Gaia: failed to create worker container after pull: error="Error response from daemon: No such image: gaiapipeline/gaia:latest"
2019-10-11T09:29:31.864+0200 [ERROR] Gaia: failed to setup docker worker for pipeline run: error="Error response from daemon: No such image: gaiapipeline/gaia:latest"
2019-10-11T09:29:33.007+0200 [ERROR] Gaia: failed to create worker container: error="Error response from daemon: No such image: gaiapipeline/gaia:latest"
2019-10-11T09:29:33.007+0200 [INFO ] Gaia: try to pull docker image and then try it again...
2019-10-11T09:29:35.262+0200 [ERROR] Gaia: failed to create worker container after pull: error="Error response from daemon: No such image: gaiapipeline/gaia:latest"
2019-10-11T09:29:35.262+0200 [ERROR] Gaia: failed to setup docker worker for pipeline run: error="Error response from daemon: No such image: gaiapipeline/gaia:latest"

Also, this made me realise that there is no timeout on this thing. :D This can keep failing and retrying indefinitely.

Not sure what's wrong because gaiapipeline/gaia:latest is an existing image. But it failed to pull it. I think it didn't block and wait for pulling the image?

@michelvocks
Copy link
Member Author

michelvocks commented Oct 11, 2019

First, when I created the pipeline and enabled the docker thing on the pipeline creation ui, it wasn't enabled for the pipeline. The indicator on the pipeline view said disabled and in fact it did not run in a container.

Let me try to reproduce this.

Once I enabled this though... It failed to run unfortunately by containously spewing this:
Not sure what's wrong because gaiapipeline/gaia:latest is an existing image. But it failed to pull it. I think it didn't block and wait for pulling the image?

It should wait for the pull to finish but I will test that again. However, please keep in mind that we didn't release a new version (and also not a new docker image). That means the current Gaia version in the latest image has not the worker stuff included and will therefore not work. I manually build a image from master-branch and used that for testing.

Also, this made me realise that there is no timeout on this thing. :D This can keep failing and retrying indefinitely.

Right. I thought about that too and I personally think that we maybe don't want a time out? e.g. try it as long as possible strategy.

@Skarlso
Copy link
Member

Skarlso commented Oct 11, 2019

The problem with as long as possible is that this continued on until I killed Gaia. Which means the pipeline is blocked until a Gaia restart occurs. I don't think that's a good idea.

True, the image wouldn't have the new worker stuff, but that's not the error. :) The error is that the image is not found. Which is weird because the image IS there. Did you try pushing your image up somewhere and deleting it locally and trying to run it like that?

@michelvocks
Copy link
Member Author

michelvocks commented Dec 2, 2019

@Skarlso

First, when I created the pipeline and enabled the docker thing on the pipeline creation ui, it wasn't enabled for the pipeline. The indicator on the pipeline view said disabled and in fact it did not run in a container.

I tried to reproduce this but it somehow works for me. Can you just give it another try? Maybe you clicked in the create pipeline UI on the cancel button? 😄

True, the image wouldn't have the new worker stuff, but that's not the error. :) The error is that the image is not found. Which is weird because the image IS there. Did you try pushing your image up somewhere and deleting it locally and trying to run it like that?

There was a bug in the image pull code. Should now work 👍

The problem with as long as possible is that this continued on until I killed Gaia. Which means the pipeline is blocked until a Gaia restart occurs. I don't think that's a good idea.

I'm wondering if this is actually a problem in general and should be addressed in a different PR. I mean this can also happen for normal pipeline runs, right? I think we should add a "context" to a pipeline run with a timeout which makes sure after a certain time a pipeline will be canceled?

@Skarlso
Copy link
Member

Skarlso commented Dec 3, 2019

yo. :) I'll re-try as soon as possible! :)

@Skarlso
Copy link
Member

Skarlso commented Dec 3, 2019

I'm wondering if this is actually a problem in general and should be addressed in a different PR. I mean this can also happen for normal pipeline runs, right? I think we should add a "context" to a pipeline run with a timeout which makes sure after a certain time a pipeline will be canceled?

Agreed!

@Skarlso
Copy link
Member

Skarlso commented Dec 7, 2019

going to try this now on a fresh laptop. :)

@Skarlso
Copy link
Member

Skarlso commented Dec 7, 2019

eh, couldn't build the image...

E: Version '8u181-b13-2~deb9u1' for 'openjdk-8-jdk' was not found
E: Version '20170531+nmu1' for 'ca-certificates-java' was not found

@Skarlso
Copy link
Member

Skarlso commented Dec 7, 2019

These new versions worked:

ENV JAVA_VERSION 8u232
ENV JAVA_DEBIAN_VERSION 8u232-b09-1~deb9u1

# see https://bugs.debian.org/775775
# and https://github.com/docker-library/java/issues/19#issuecomment-70546872
ENV CA_CERTIFICATES_JAVA_VERSION 20170929~deb9u3

How do you build your image?

@Skarlso
Copy link
Member

Skarlso commented Dec 7, 2019

Okay, obviously you can't just build this thing. :D

COPY failed: stat /var/lib/docker/tmp/docker-builder354759785/docker/settings-docker.xml: no such file or directory

Ah I see. You expected it to be built from the main folder.

@Skarlso
Copy link
Member

Skarlso commented Dec 7, 2019

Oh buuu

Step 46/51 : COPY gaia-linux-amd64 /app
COPY failed: stat /var/lib/docker/tmp/docker-builder648944948/gaia-linux-amd64: no such file or directory

@michelvocks michelvocks mentioned this pull request Dec 10, 2019
@michelvocks
Copy link
Member Author

It feels like every 2-3 month the newest Java version is already deprecated again and it is not possible to build the docker image again. I think we just have to regularly bump the version.

Usually, the docker image build process is started by goreleaser that is why it is expected to be in the root folder.

@Skarlso
Copy link
Member

Skarlso commented Jan 28, 2020

@michelvocks I had a play with this last night and I feel like it's doing what it's supposed to do. :D The building is clunky, true. :/ I have yet to find a good solution for that. We can create a ticket for that maybe and track it?

So, LGTM! :)

@Skarlso
Copy link
Member

Skarlso commented Feb 14, 2020

@michelvocks How about we ditch the pinning to a specific Java 8 open jdk version? I don't see any problems with trying to use whatever latest stable is in there, do you?

So this, works perfectly fine:

	apt-get install -y --no-install-recommends \
		openjdk-8-jdk \
		ca-certificates-java \

\o/

Successfully built 8f528061c799
Successfully tagged gaiapipeline/gaia:latest

@michelvocks
Copy link
Member Author

@Skarlso Sounds good. I will do that in a separate PR.

@michelvocks michelvocks merged commit b2dfb95 into gaia-pipeline:master Feb 14, 2020
@michelvocks michelvocks deleted the docker_pipeline_run branch February 14, 2020 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs review Ready To Merge PR is ready to be merged into master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants